@xuxiong1 xuxiong1 commented Dec 4, 2025

Description

This commit introduces a feature to read translog operations in forward order
(oldest to newest) instead of the default backward order (newest to oldest).

Changes:

  • Add index.translog.read_forward setting (default: false) in IndexSettings
  • Update MultiSnapshot to support bidirectional reading based on setting

Tests:

  • testRecoveryTrimsLocalTranslogWithReadForward (RecoveryTests)
  • testSeqNoCollisionWithReadForward (IndexLevelReplicationTests)
  • testSnapshotReadOperationForward (LocalTranslogTests)

Related Issues

Resolves #20094

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • New Features

    • Add a configurable index-level option to enable forward reading of transaction logs for snapshots and recoveries.
  • Improvements

    • Resync now advances translog generation before trimming and adjusts initial sync boundaries to improve recovery correctness.
  • Tests

    • Added tests validating forward translog reading across snapshots, replication/recovery flows, and recovery trimming.
  • Documentation

    • Updated changelog to announce forward translog read support.


coderabbitai bot commented Dec 4, 2025

Walkthrough

Adds an index-level boolean to control translog read direction, threads it into Translog and MultiSnapshot to allow forward or backward snapshot iteration, changes translog generation handling before resync trimming, adjusts initial sync boundary computation, and adds tests exercising forward-read behavior.

Changes

Cohort / File(s) Summary
Changelog & Index settings
CHANGELOG.md, server/src/main/java/org/opensearch/index/IndexSettings.java
Added changelog line for forward translog reading and introduced INDEX_TRANSLOG_READ_FORWARD_SETTING (index-scoped boolean, default false), backing field translogReadForward, and accessor isTranslogReadForward().
Translog snapshot traversal
server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java
Constructor now takes boolean readForward; traversal uses a PrimitiveIterator.OfInt to iterate either forward (0..N-1) or backward (N-1..0) while preserving per-operation deduplication.
Translog integration
server/src/main/java/org/opensearch/index/translog/Translog.java
newMultiSnapshot(...) now forwards indexSettings().isTranslogReadForward() into the MultiSnapshot constructor (constructor signature changed).
Resync trimming ordering
server/src/main/java/org/opensearch/action/resync/TransportResyncReplicationAction.java
When a resync request includes TrimAboveSeqNo, the replica now calls replica.rollTranslogGeneration() before trimming previously synchronized primary terms.
Primary/replica sync boundary
server/src/main/java/org/opensearch/index/shard/PrimaryReplicaSyncer.java
Initial resync boundary for the first batch changed to use startingSeqNo - 1 (instead of previous max-based value) when constructing ResyncReplicationRequest.
Replication & recovery tests
server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java, server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java
Added testSeqNoCollisionWithReadForward() and testRecoveryTrimsLocalTranslogWithReadForward() to validate forward-read paths; one test addition is duplicated in the file.
Translog unit tests
server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java
Added testSnapshotReadOperationForward() to assert forward-enabled translog snapshots return operations in forward generation order.
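
The traversal change summarized for MultiSnapshot can be pictured with a small, self-contained sketch. It assumes only what the summary states: a PrimitiveIterator.OfInt yields 0..N-1 when reading forward and N-1..0 otherwise. The class and method names below are hypothetical and are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

// Hypothetical sketch; not the MultiSnapshot implementation.
final class TraversalOrderSketch {
    // Yields 0..n-1 when readForward is true, otherwise n-1..0.
    // For n == 0 both branches produce an empty iterator.
    static PrimitiveIterator.OfInt indexIterator(int n, boolean readForward) {
        return readForward
            ? IntStream.range(0, n).iterator()
            : IntStream.iterate(n - 1, i -> i >= 0, i -> i - 1).iterator();
    }

    // Collects the order in which generation indices would be visited.
    static List<Integer> visitOrder(int n, boolean readForward) {
        List<Integer> order = new ArrayList<>();
        PrimitiveIterator.OfInt it = indexIterator(n, readForward);
        while (it.hasNext()) {
            order.add(it.nextInt());
        }
        return order;
    }
}
```

Using one iterator for both directions keeps the read loop identical; only the index source changes.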

Sequence Diagram(s)

sequenceDiagram
    participant Config as Index Config
    participant Settings as IndexSettings
    participant Translog as Translog
    participant Multi as MultiSnapshot
    participant Files as Translog Files

    Config->>Settings: load index settings
    Settings->>Settings: read INDEX_TRANSLOG_READ_FORWARD_SETTING
    Translog->>Settings: isTranslogReadForward()
    Translog->>Multi: newMultiSnapshot(snapshots, onClose, readForward)
    alt readForward == true
        note right of Multi: iterator yields 0..N-1 (forward)
    else
        note right of Multi: iterator yields N-1..0 (backward)
    end
    loop replay snapshots
        Multi->>Files: read snapshot at iterator index
        Files-->>Multi: operations (per-file order preserved)
        Multi->>Multi: deduplicate by seqNo and yield operations
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

  • Areas needing extra attention:
    • MultiSnapshot: correctness of iterator initialization and preserved deduplication semantics for both directions.
    • Translog.newMultiSnapshot: propagation of the flag and onClose/error behavior.
    • TransportResyncReplicationAction: ordering of rollTranslogGeneration() relative to trimming.
    • PrimaryReplicaSyncer: correctness of the new initial boundary computation.
    • Tests: duplicated test in IndexLevelReplicationTests and deterministic behavior of forward-read tests.

Suggested labels

v3.4.0, backport 3.4

Suggested reviewers

  • sachinpkale
  • reta
  • mch2
  • msfroh
  • andrross
  • dbwiddis
  • kotwanikunal
  • shwetathareja
  • owaiskazi19
  • saratvemulapalli

Poem

🐰 I hopped through tlogs, from old to new,
Forward I bounded, then backward I knew.
Generations stitched in tidy array,
Seq numbers steady, no stale in my way.
The rabbit nods — replay's set to play.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 22.22%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title clearly and concisely describes the main feature added: support for forward translog reading, which matches the primary changeset.
Description check | ✅ Passed | The description covers the main changes, lists three new tests, references the resolved issue #20094, and includes completed checklist items as required by the template.
Linked Issues check | ✅ Passed | The PR addresses the core requirements from issue #20094: adds a configurable setting for forward translog reading [#20094], updates MultiSnapshot for bidirectional reading [#20094], and includes tests demonstrating the functionality [#20094].
Out of Scope Changes check | ✅ Passed | All code changes are directly related to enabling forward translog reading: new index setting, MultiSnapshot updates, translog changes, resync trimming adjustments, and comprehensive test coverage. No unrelated or out-of-scope changes detected.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79c1e0a and 78bc921.

📒 Files selected for processing (1)
  • CHANGELOG.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: Analyze (java)
🔇 Additional comments (1)
CHANGELOG.md (1)

41-41: Changelog entry looks good.

The missing closing parenthesis flagged in the previous review has been addressed, and the markdown link formatting is now correct.


@github-actions github-actions bot added the enhancement and Indexing:Replication labels Dec 4, 2025
github-actions bot commented Dec 4, 2025

❌ Gradle check result for c968213: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java (1)

638-704: Consider extracting common test logic.

The test duplicates 66 lines from testSeqNoCollision() (lines 571-636), differing only in the setting on line 646. While explicit duplication in tests aids clarity, you might consider a parameterized test or helper method to reduce maintenance overhead if the test logic evolves.

Example approach using a helper method:

public void testSeqNoCollision() throws Exception {
    testSeqNoCollisionWithSettings(Settings.builder()
        .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true)
        .put(IndexSettings.INDEX_TRANSLOG_RETENTION_AGE_SETTING.getKey(), "-1")
        .put(IndexSettings.INDEX_TRANSLOG_RETENTION_SIZE_SETTING.getKey(), "-1")
        .build());
}

public void testSeqNoCollisionWithReadForward() throws Exception {
    testSeqNoCollisionWithSettings(Settings.builder()
        .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true)
        .put(IndexSettings.INDEX_TRANSLOG_RETENTION_AGE_SETTING.getKey(), "-1")
        .put(IndexSettings.INDEX_TRANSLOG_RETENTION_SIZE_SETTING.getKey(), "-1")
        .put(IndexSettings.INDEX_TRANSLOG_READ_FORWARD_SETTING.getKey(), true)
        .build());
}

private void testSeqNoCollisionWithSettings(Settings settings) throws Exception {
    // Common test logic here
}
server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java (1)

3876-3934: Forward-read snapshot test wiring looks correct

The test correctly:

  • Constructs a translog with INDEX_TRANSLOG_READ_FORWARD_SETTING enabled via getTranslogConfig(tempDir, settings).
  • Uses a separate LocalTranslog instance so it doesn’t interfere with the class-level translog.
  • Populates views in generation order and asserts that newSnapshot() yields the concatenated operations in forward (oldest→newest) order.

This aligns with the new forward-reading MultiSnapshot semantics and gives good coverage for multi-generation snapshots.

As a minor optional clean-up, you could factor the common setup logic between this test and testSnapshotReadOperationInReverse into a small helper to reduce duplication, but it’s not strictly necessary here.

server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java (1)

533-572: Read-forward recovery test mirrors baseline behavior appropriately

This test cleanly mirrors testRecoveryTrimsLocalTranslog while:

  • Enabling INDEX_TRANSLOG_READ_FORWARD_SETTING on the index.
  • Creating the replication group with an InternalEngineFactory, keeping the execution model aligned with the baseline test.
  • Reusing the same flow (in-flight docs, replica promotion/demotion, recovery, and consistency assertions), which is exactly what we need to validate trimming semantics under forward translog reading.

The coverage looks solid and should catch regressions specific to read-forward mode.

If you want to reduce maintenance overhead later, you could extract the common body of this test and testRecoveryTrimsLocalTranslog into a shared helper that takes the Settings (or a boolean readForward) as a parameter, but this duplication is acceptable as-is.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 0449ce8 and 234b0c9.

📒 Files selected for processing (7)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/index/IndexSettings.java (4 hunks)
  • server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java (2 hunks)
  • server/src/main/java/org/opensearch/index/translog/Translog.java (1 hunks)
  • server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java (1 hunks)
  • server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java (1 hunks)
  • server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java (1 hunks)
🔇 Additional comments (4)
server/src/main/java/org/opensearch/index/translog/Translog.java (1)

764-765: LGTM!

The implementation correctly reads the new isTranslogReadForward() setting from IndexSettings and passes it to the MultiSnapshot constructor, enabling bidirectional translog reading based on index configuration.

server/src/main/java/org/opensearch/index/IndexSettings.java (1)

914-914: Translog read‑forward flag wiring looks correct; confirm non‑dynamic lifecycle is intentional

The new translogReadForward flag is:

  • Backed by INDEX_TRANSLOG_READ_FORWARD_SETTING with default false.
  • Read once at construction time from settings.
  • Stored in a final field and exposed via isTranslogReadForward().

This means the read direction is effectively fixed for the lifetime of the IndexSettings instance and cannot be changed via a dynamic settings update (the setting also lacks Property.Dynamic and there is no update consumer).

If the intent is “configure at index creation (or close/open) only”, this is perfectly fine and keeps behavior simple. If you expect operators to toggle read‑forward on an already‑open index, you’d need to:

  • Mark the setting as Property.Dynamic.
  • Store it in a volatile field instead of final.
  • Register an update consumer on scopedSettings similar to other translog settings.

Given the sensitivity of recovery semantics, the non‑dynamic approach is probably safer, but it’s worth double‑checking that this matches your operational expectations.

Also applies to: 1135-1135, 2102-2107
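
To make the final versus dynamic distinction concrete, here is a minimal, self-contained sketch. It does not use the OpenSearch Setting or IndexScopedSettings classes; Registry, FixedHolder, and DynamicHolder are hypothetical stand-ins that only mirror the three steps listed above (a dynamic property, a volatile field, and an update consumer).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; these types are not part of OpenSearch.
final class DynamicFlagSketch {
    // Stand-in for a settings registry that notifies consumers on update.
    static final class Registry {
        private final List<Consumer<Boolean>> consumers = new ArrayList<>();

        void addUpdateConsumer(Consumer<Boolean> consumer) {
            consumers.add(consumer);
        }

        void update(boolean value) {
            consumers.forEach(c -> c.accept(value));
        }
    }

    // Non-dynamic: the value is read once and never changes.
    static final class FixedHolder {
        private final boolean readForward;

        FixedHolder(boolean initial) {
            this.readForward = initial;
        }

        boolean isReadForward() {
            return readForward;
        }
    }

    // Dynamic: a volatile field updated through a registered consumer.
    static final class DynamicHolder {
        private volatile boolean readForward;

        DynamicHolder(boolean initial, Registry registry) {
            this.readForward = initial;
            registry.addUpdateConsumer(v -> this.readForward = v);
        }

        boolean isReadForward() {
            return readForward;
        }
    }
}
```

In this model a later update reaches only the holder that registered a consumer, which is the behavioral difference the comment asks the author to confirm.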

server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java (2)

57-70: Constructor wiring for readForward flag and index initialization looks sound

The added readForward flag and constructor wiring are consistent:

  • readForward is final, so direction is immutable per snapshot.
  • index = readForward ? 0 : translogs.length - 1; correctly handles both directions and the empty‑array case (0 vs -1 with loops guarding against out‑of‑bounds).

No correctness issues here.


84-111: Forward vs backward traversal shares correct dedupe semantics; relies on trim precondition

The new next() implementation cleanly bifurcates behavior:

  • Forward path (readForward == true): iterates index from 0 to translogs.length - 1, consuming each TranslogSnapshot in order and using seenSeqNo.getAndSet + overriddenOperations in the same way as before.
  • Backward path keeps the original behavior (from translogs.length - 1 down to 0) with identical per‑operation logic.

A few points worth noting:

  • The reuse of the same inner loop and SeqNoSet logic in both branches preserves the existing semantics of “first occurrence wins with respect to the chosen traversal order” and keeps overriddenOperations accounting correct.
  • With backward reading, “first occurrence” means “latest generation wins”, which avoids stale operations from older primary terms.
  • With forward reading, “first occurrence” means “oldest generation wins”. This matches the risk described in the new index setting Javadoc: if trimming of stale operations (trimOperationOfPreviousPrimaryTerms(...)) hasn’t happened yet, forward traversal can surface outdated ops from older primary terms.

Given that:

  • As long as forward reading is only enabled in flows where stale‑term trimming is guaranteed to have run before constructing this MultiSnapshot, this implementation is consistent with the documented behavior.
  • If you want extra safety, you could consider adding assertions or tighter invariants at the call site (e.g., around when forward snapshots are created) to ensure we don’t accidentally use forward traversal in a pre‑trim state, but that’s optional and outside this class.

Overall, the bidirectional iteration logic here is correct and symmetric.
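
The "first occurrence wins" point can be illustrated with a small sketch. It is a simplified model, not the MultiSnapshot code: a HashSet stands in for the seqNo tracking, and Op and replay are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of direction-dependent deduplication.
final class DedupDirectionSketch {
    // Stand-in for a translog operation.
    static final class Op {
        final long seqNo;
        final long primaryTerm;

        Op(long seqNo, long primaryTerm) {
            this.seqNo = seqNo;
            this.primaryTerm = primaryTerm;
        }
    }

    // generations is ordered oldest first; each inner list is one generation.
    static List<Op> replay(List<List<Op>> generations, boolean readForward) {
        Set<Long> seenSeqNo = new HashSet<>();
        List<Op> out = new ArrayList<>();
        int n = generations.size();
        for (int step = 0; step < n; step++) {
            int index = readForward ? step : n - 1 - step;
            for (Op op : generations.get(index)) {
                // The first operation seen for a seqNo wins; later ones are skipped.
                if (seenSeqNo.add(op.seqNo)) {
                    out.add(op);
                }
            }
        }
        return out;
    }
}
```

With two generations holding seqNo 5 under terms 1 and 2, forward replay returns the term-1 operation and backward replay returns the term-2 operation, which is why forward reading depends on stale-term trimming having already run.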

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 234b0c9 and 73f1414.

📒 Files selected for processing (2)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/index/IndexSettings.java (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.md
🔇 Additional comments (3)
server/src/main/java/org/opensearch/index/IndexSettings.java (3)

915-915: LGTM! Field declaration is correct.

The field is appropriately declared as final since the setting is not dynamic.


1136-1136: LGTM! Field initialization is correct.

The initialization follows the standard pattern for non-dynamic IndexScope settings.


2103-2108: LGTM! Getter method is correctly implemented.

The method follows standard naming conventions and provides appropriate access to the setting value.

github-actions bot commented Dec 5, 2025

❌ Gradle check result for fe0811d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions bot commented Dec 9, 2025

❌ Gradle check result for aaf78c5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/IndexSettings.java (1)

1136-1137: Consider using scopedSettings.get for consistency

Here you read the new setting directly from settings:

this.translogReadForward = INDEX_TRANSLOG_READ_FORWARD_SETTING.get(settings);

For consistency with nearby index-level settings, consider:

this.translogReadForward = scopedSettings.get(INDEX_TRANSLOG_READ_FORWARD_SETTING);

This keeps the access pattern uniform and future-proofs things if the setting ever becomes dynamic.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between aaf78c5 and f5c7303.

📒 Files selected for processing (7)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/index/IndexSettings.java (4 hunks)
  • server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java (4 hunks)
  • server/src/main/java/org/opensearch/index/translog/Translog.java (1 hunks)
  • server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java (1 hunks)
  • server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java (1 hunks)
  • server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • server/src/main/java/org/opensearch/index/translog/Translog.java
  • CHANGELOG.md
  • server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java
  • server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java
  • server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java
  • server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java
🔇 Additional comments (3)
server/src/main/java/org/opensearch/index/IndexSettings.java (3)

915-915: Final field for read-forward flag is appropriate

Using a final boolean translogReadForward here matches the non-dynamic nature of the setting and is concurrency-safe for read-only access.


2103-2108: Accessor for translog read-forward flag is straightforward and clear

The getter and its Javadoc are aligned with the setting semantics and existing isXyzEnabled() patterns on this class.


199-218: New translog read-forward setting and Javadoc look good; verify registration in IndexScopedSettings

The setting definition and Javadoc are clear and accurately document the rare stale-operation edge case. Since this is declared with only Property.IndexScope (non-dynamic), it's fixed for the lifetime of IndexSettings, which is appropriate for this use case.

Verify that INDEX_TRANSLOG_READ_FORWARD_SETTING is registered in the appropriate IndexScopedSettings built-in set (likely BUILT_IN_INDEX_SETTINGS or similar) so that index creation and updates recognize this setting and do not report it as unknown.

github-actions bot commented Dec 9, 2025

❌ Gradle check result for f5c7303: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

msfroh commented Dec 9, 2025

@xuxiong1 -- Looks like the latest test failure is related to this change

org.opensearch.indices.recovery.RecoveryTests.testRecoveryTrimsLocalTranslogWithReadForward failed

java.lang.AssertionError: NoOp{seqNo=26, primaryTerm=41, reason='filling gaps'}
Expected: <40L>
     but: was <41L>
REPRODUCE WITH: 
./gradlew ':server:test' --tests 'org.opensearch.indices.recovery.RecoveryTests.testRecoveryTrimsLocalTranslogWithReadForward' -Dtests.seed=AA22BA30D3B12559 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=yue-Hans -Dtests.timezone=America/Knox_IN -Druntime.java=25

@github-actions
Contributor

❌ Gradle check result for 3c85d10: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@xuxiong1
Contributor Author

> @xuxiong1 -- Looks like the latest test failure is related to this change
>
> org.opensearch.indices.recovery.RecoveryTests.testRecoveryTrimsLocalTranslogWithReadForward failed
>
> java.lang.AssertionError: NoOp{seqNo=26, primaryTerm=41, reason='filling gaps'}
> Expected: <40L>
>      but: was <41L>
> REPRODUCE WITH:
> ./gradlew ':server:test' --tests 'org.opensearch.indices.recovery.RecoveryTests.testRecoveryTrimsLocalTranslogWithReadForward' -Dtests.seed=AA22BA30D3B12559 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=yue-Hans -Dtests.timezone=America/Knox_IN -Druntime.java=25

Was able to reproduce with the seed.

I think the issue comes from the primary-replica resync process: the new primary resyncs its operations (from the global checkpoint to the max seqNo) to the old primary (now a replica), which leaves duplicate operations (same seqNo, different term) in the old primary's translog.

The resync process also triggers trimOperationOfPreviousPrimaryTerms on the replica, but it is passed the max seqNo as trimAboveSeqNo (honestly, I don't understand why we pass the max seqNo instead of the starting seqNo), so the operations between the global checkpoint and the max seqNo on the replica are not actually trimmed. Reading forward then encounters the stale operation (reading backward hides the issue).

The fix is to trim on the replica during resync starting from the global checkpoint instead of the max seqNo, as is done in finalizeRecovery.
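
A simplified model of the boundary problem described above, for illustration only. It assumes that trimming drops operations from an older primary term whose seqNo is strictly above the given boundary; Op and trim are hypothetical names, not the translog API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how the trim boundary affects stale operations.
final class TrimBoundarySketch {
    // Stand-in for a translog operation.
    static final class Op {
        final long seqNo;
        final long primaryTerm;

        Op(long seqNo, long primaryTerm) {
            this.seqNo = seqNo;
            this.primaryTerm = primaryTerm;
        }
    }

    // Assumed semantics: drop operations written under a term older than
    // currentTerm whose seqNo is strictly above trimAboveSeqNo.
    static List<Op> trim(List<Op> ops, long currentTerm, long trimAboveSeqNo) {
        List<Op> kept = new ArrayList<>();
        for (Op op : ops) {
            boolean stale = op.primaryTerm < currentTerm && op.seqNo > trimAboveSeqNo;
            if (!stale) {
                kept.add(op);
            }
        }
        return kept;
    }
}
```

With old-term operations at seqNo 1..3, a global checkpoint of 1, and a max seqNo of 3, passing the max seqNo as the boundary trims nothing, while passing the global checkpoint removes seqNo 2 and 3.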

@github-actions
Contributor

❌ Gradle check result for 5c8b37d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
CHANGELOG.md (1)

41-42: Deduplicate and fix the forward‑translog changelog entry

There are two identical bullets for forward translog reading, and the first one is still missing the closing ). Keep a single, correctly formatted entry:

- - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163)
- - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163))
+ - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163))
🧹 Nitpick comments (3)
server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java (1)

43-76: Forward traversal + seenSeqNo now prefers older ops if duplicates slip through

The iterator setup and index/-1 sentinel look correct for both directions, and backward traversal preserves the prior behavior. One subtle change in semantics:

  • With readForward == false, we still visit newer generations first, so seenSeqNo ensures the newest op for a given seqNo “wins”.
  • With readForward == true, we now visit older generations first; if any duplicate seqNo operations remain (e.g., from incomplete trimming), seenSeqNo will keep the first/oldest op and treat later/newer ones as overridden.

The design here assumes that, under INDEX_TRANSLOG_READ_FORWARD_SETTING == true, trimOperationOfPreviousPrimaryTerms(...) guarantees that duplicates for a given seqNo have already been removed, so this reversal never becomes observable. If that invariant ever regresses, forward replay would silently favor stale operations.

Consider either:

  • Asserting in forward mode that no duplicate seqNos are seen, or
  • Adjusting the dedup strategy when readForward == true (or disabling it entirely there) so that newer operations cannot be shadowed if duplicates do appear.

Also applies to: 88-101
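
One way to picture the first option (failing fast on duplicates in forward mode) is the sketch below. It is an assumption-laden illustration rather than a proposed patch: countUnique is a hypothetical helper, and it throws an exception where real code might use an assert.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a forward-mode duplicate check.
final class ForwardDuplicateCheckSketch {
    // Returns the number of unique seqNos. In forward mode a repeated seqNo is
    // treated as a broken invariant instead of being silently skipped.
    static int countUnique(List<Long> seqNosInReadOrder, boolean readForward) {
        Set<Long> seen = new HashSet<>();
        for (long seqNo : seqNosInReadOrder) {
            boolean firstTime = seen.add(seqNo);
            if (!firstTime && readForward) {
                throw new IllegalStateException("duplicate seqNo " + seqNo + " seen in forward mode");
            }
        }
        return seen.size();
    }
}
```

Backward mode keeps the existing skip-on-duplicate behavior, so only the new forward path pays for the stricter check.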

server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java (1)

638-704: Forward‑read seqNo‑collision test mirrors backward case and looks correct

testSeqNoCollisionWithReadForward faithfully mirrors testSeqNoCollision while enabling INDEX_TRANSLOG_READ_FORWARD_SETTING, validating that the resync + trim logic still yields a single higher‑term op and that peer recovery only transfers non‑overridden operations with zero skipped ops. This is the right regression test for the failure described in the PR.

If you find more of these forward/backward pairs accumulating, it might be worth extracting a small helper parameterized by readForward to avoid test drift, but that’s optional here.

server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java (1)

3876-3934: Forward snapshot test correctly exercises read‑forward ordering

This forward‑only test mirrors testSnapshotReadOperationInReverse and validates that, with INDEX_TRANSLOG_READ_FORWARD_SETTING enabled, newSnapshot() yields operations in per‑generation insertion order (gen0, gen1, …), which is exactly what MultiSnapshot’s forward iterator is supposed to do.

One small clarity tweak you could consider: instead of wiring () -> globalCheckpoint.get() from the class field, pass a fixed supplier (e.g. () -> SequenceNumbers.NO_OPS_PERFORMED) to the LocalTranslog constructed here so the test is fully independent of any prior changes to globalCheckpoint. Behavior is unchanged today but the intent would be more explicit.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 5c8b37d and 7ca112a.

📒 Files selected for processing (10)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/action/resync/TransportResyncReplicationAction.java (1 hunks)
  • server/src/main/java/org/opensearch/index/IndexSettings.java (4 hunks)
  • server/src/main/java/org/opensearch/index/shard/PrimaryReplicaSyncer.java (1 hunks)
  • server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java (4 hunks)
  • server/src/main/java/org/opensearch/index/translog/Translog.java (1 hunks)
  • server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java (1 hunks)
  • server/src/test/java/org/opensearch/index/shard/PrimaryReplicaSyncerTests.java (1 hunks)
  • server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java (1 hunks)
  • server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java
  • server/src/main/java/org/opensearch/index/translog/Translog.java
  • server/src/main/java/org/opensearch/index/shard/PrimaryReplicaSyncer.java
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java (1)
libs/core/src/main/java/org/opensearch/core/xcontent/MediaTypeRegistry.java (1)
  • MediaTypeRegistry (57-418)
🔇 Additional comments (6)
server/src/main/java/org/opensearch/index/IndexSettings.java (4)

199-218: LGTM! Setting declaration is well-designed.

The setting is properly declared with comprehensive documentation. The non-dynamic nature (absence of Property.Dynamic) is appropriate, as changing translog read direction mid-operation could lead to inconsistencies. The Javadoc clearly explains the edge cases around stale operations during recovery, which aligns with the concerns raised in the PR discussion.


915-915: LGTM! Field declaration follows conventions.

The field is appropriately declared as private final, which is consistent with other non-dynamic settings in this class and ensures the value cannot be modified after initialization.


1136-1136: LGTM! Field initialization is correct.

The field is properly initialized in the constructor using the standard Setting.get(settings) pattern, consistent with other non-dynamic settings in this class.


2103-2108: LGTM! Accessor method is correctly implemented.

The method follows the class conventions with appropriate naming (isTranslogReadForward) and provides clear Javadoc. The placement near similar accessor methods like isSoftDeleteEnabled() maintains good code organization.

server/src/main/java/org/opensearch/action/resync/TransportResyncReplicationAction.java (1)

202-205: Rolling translog generation before trimming aligns with trim invariants

Calling replica.rollTranslogGeneration() before trimOperationOfPreviousPrimaryTerms(...) is consistent with the translog trim invariants (e.g., avoiding trimming from the current writer generation) and matches how trimming is exercised in tests with forward reading. Looks good.

server/src/test/java/org/opensearch/index/shard/PrimaryReplicaSyncerTests.java (1)

126-142: Expecting trimAboveSeqNo == global checkpoint matches new resync semantics

Updating the assertion to equalTo(globalCheckPoint) correctly reflects the new behavior where resync trimming starts from the global checkpoint rather than maxSeqNo. This keeps the test consistent with the updated PrimaryReplicaSyncer logic.

@github-actions
Contributor

❌ Gradle check result for 7ca112a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: xuxiong1 <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
CHANGELOG.md (1)

41-42: Remove duplicated/malformed changelog entry for forward translog reading

You currently have two entries for the same feature, and the first is still missing the closing parenthesis. Keep a single, well-formed line.

- - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163)
- - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163))
+ - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163))
🧹 Nitpick comments (1)
server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java (1)

638-704: Consider refactoring to reduce duplication.

The testSeqNoCollisionWithReadForward() test is nearly identical to testSeqNoCollision() (lines 571-636), differing only in enabling the INDEX_TRANSLOG_READ_FORWARD_SETTING. While duplicating tests to cover feature flag variations is a common and acceptable pattern, you could reduce maintenance burden by extracting the shared logic into a parameterized test or helper method.

Example refactor:

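// A plain helper is used here instead of JUnit 5's @ParameterizedTest, which this
// test class is not assumed to support. runSeqNoCollision is a hypothetical name.
public void testSeqNoCollision() throws Exception {
    runSeqNoCollision(false);
}

public void testSeqNoCollisionWithReadForward() throws Exception {
    runSeqNoCollision(true);
}

private void runSeqNoCollision(boolean readForward) throws Exception {
    Settings.Builder settingsBuilder = Settings.builder()
        .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true)
        .put(IndexSettings.INDEX_TRANSLOG_RETENTION_AGE_SETTING.getKey(), "-1")
        .put(IndexSettings.INDEX_TRANSLOG_RETENTION_SIZE_SETTING.getKey(), "-1");
    if (readForward) {
        // Only the forward-read variant overrides the default (false).
        settingsBuilder.put(IndexSettings.INDEX_TRANSLOG_READ_FORWARD_SETTING.getKey(), true);
    }
    try (ReplicationGroup shards = createGroup(2, settingsBuilder.build())) {
        // ... shared test logic ...
    }
}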

That said, the current approach is perfectly acceptable and may be preferred if you want each scenario to be independently debuggable.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 7ca112a and 79c1e0a.

📒 Files selected for processing (10)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/action/resync/TransportResyncReplicationAction.java (1 hunks)
  • server/src/main/java/org/opensearch/index/IndexSettings.java (4 hunks)
  • server/src/main/java/org/opensearch/index/shard/PrimaryReplicaSyncer.java (1 hunks)
  • server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java (4 hunks)
  • server/src/main/java/org/opensearch/index/translog/Translog.java (1 hunks)
  • server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java (1 hunks)
  • server/src/test/java/org/opensearch/index/shard/PrimaryReplicaSyncerTests.java (1 hunks)
  • server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java (1 hunks)
  • server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • server/src/main/java/org/opensearch/index/translog/Translog.java
  • server/src/test/java/org/opensearch/index/shard/PrimaryReplicaSyncerTests.java
🔇 Additional comments (6)
server/src/main/java/org/opensearch/action/resync/TransportResyncReplicationAction.java (1)

184-205: Rolling translog generation before trimming looks correct and matches trim invariants

Calling replica.rollTranslogGeneration() before trimOperationOfPreviousPrimaryTerms ensures no mixed-term operations remain in the current writer when trimming runs, which matches the LocalTranslog trim invariants and avoids stale previous-primary-term ops when forward-reading the translog. The sequencing here (apply ops → roll → trim) looks sound.
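
The apply ops → roll → trim sequencing can be sketched with a toy model. Everything here is an assumption made for illustration: `RollThenTrimSketch` and its methods are hypothetical, and the model assumes that trimming only touches closed generations and never the current writer, which is how the invariant is described above. It is not the LocalTranslog implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy translog: closed generations plus one current writer (illustrative only). */
final class RollThenTrimSketch {

    /** Hypothetical stand-in for a translog operation. */
    static final class Op {
        final long seqNo;
        final long primaryTerm;

        Op(long seqNo, long primaryTerm) {
            this.seqNo = seqNo;
            this.primaryTerm = primaryTerm;
        }
    }

    private final List<List<Op>> closedGenerations = new ArrayList<>();
    private List<Op> currentWriter = new ArrayList<>();

    /** Appends an operation to the current writer generation. */
    void add(long seqNo, long primaryTerm) {
        currentWriter.add(new Op(seqNo, primaryTerm));
    }

    /** Closes the current writer and starts a fresh, empty generation. */
    void rollGeneration() {
        closedGenerations.add(currentWriter);
        currentWriter = new ArrayList<>();
    }

    /** Assumption: trimming rewrites closed generations only and leaves the current writer alone. */
    void trim(long belowTerm, long aboveSeqNo) {
        for (List<Op> generation : closedGenerations) {
            generation.removeIf(op -> op.primaryTerm < belowTerm && op.seqNo > aboveSeqNo);
        }
    }

    /** Counts operations anywhere in the log that the trim was meant to remove. */
    long staleCount(long belowTerm, long aboveSeqNo) {
        List<List<Op>> all = new ArrayList<>(closedGenerations);
        all.add(currentWriter);
        long count = 0;
        for (List<Op> generation : all) {
            for (Op op : generation) {
                if (op.primaryTerm < belowTerm && op.seqNo > aboveSeqNo) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

In this model, trimming without rolling first leaves the stale ops sitting in the current writer, while rolling first moves them into a closed generation where the trim can reach them.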

server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java (1)

43-45: Iterator-based bidirectional traversal correctly preserves snapshot and dedup semantics

The new PrimitiveIterator + index state cleanly iterates snapshots in either 0..n−1 or n−1..0 order, fully exhausting each TranslogSnapshot before advancing, while reusing the existing seenSeqNo dedup logic. This fixes the earlier “one op per snapshot” issue and looks correct for both forward and backward reads.

Also applies to: 57-76, 88-101
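
A minimal sketch of this traversal pattern follows, assuming lists of strings in place of TranslogSnapshot instances. `BidirectionalSnapshotSketch` and its members are hypothetical names chosen for illustration. The sketch mirrors the idea described above: a `PrimitiveIterator.OfInt` supplies snapshot indices in either order, -1 marks exhaustion, and each snapshot is fully drained before the next one is opened.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

/** Toy reader that walks a list of snapshots in either direction (illustrative only). */
final class BidirectionalSnapshotSketch {
    private final List<List<String>> snapshots;
    private final PrimitiveIterator.OfInt order;
    private int index;                // current snapshot index, or -1 once exhausted
    private Iterator<String> current; // operations of the current snapshot

    BidirectionalSnapshotSketch(List<List<String>> snapshots, boolean readForward) {
        this.snapshots = snapshots;
        final int n = snapshots.size();
        // Forward yields 0..n-1; backward yields n-1..0.
        this.order = readForward
            ? IntStream.range(0, n).iterator()
            : IntStream.range(0, n).map(i -> n - 1 - i).iterator();
        advance();
    }

    /** Moves to the next snapshot in the chosen order, or sets the -1 sentinel. */
    private void advance() {
        index = order.hasNext() ? order.nextInt() : -1;
        current = index == -1 ? null : snapshots.get(index).iterator();
    }

    /** Returns the next operation, or null once every snapshot is exhausted. */
    String next() {
        while (index != -1) {
            if (current.hasNext()) {
                return current.next();
            }
            advance(); // current snapshot is drained; open the next one
        }
        return null;
    }

    /** Convenience helper that drains the reader into a list. */
    static List<String> readAll(List<List<String>> snapshots, boolean readForward) {
        BidirectionalSnapshotSketch reader = new BidirectionalSnapshotSketch(snapshots, readForward);
        List<String> out = new ArrayList<>();
        for (String op = reader.next(); op != null; op = reader.next()) {
            out.add(op);
        }
        return out;
    }
}
```

Note that in this sketch the direction only changes the order in which snapshots are visited. Operations inside a single snapshot are always returned in insertion order.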

server/src/main/java/org/opensearch/index/shard/PrimaryReplicaSyncer.java (1)

319-359: Using startingSeqNo - 1 as trimAboveSeqNo correctly aligns trimming with the global checkpoint

Deriving trimAboveSeqNo from startingSeqNo - 1 (i.e., the last known global checkpoint) and always sending the first request even when there are no ops ensures replicas trim previous-primary-term operations for all seqNos above the global checkpoint, fixing the earlier “trim from max seqNo only” gap without affecting later batches.
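
The effect of the boundary change can be worked through with small numbers. The sketch below is an assumption-laden illustration, not the PrimaryReplicaSyncer code: `ResyncTrimSketch`, `Op`, and `trim` are hypothetical, and the trim rule (drop ops from older terms whose seqNo is above the boundary) is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy comparison of trim boundaries during resync (illustrative only). */
final class ResyncTrimSketch {

    /** Hypothetical stand-in for a translog operation. */
    static final class Op {
        final long seqNo;
        final long primaryTerm;

        Op(long seqNo, long primaryTerm) {
            this.seqNo = seqNo;
            this.primaryTerm = primaryTerm;
        }
    }

    /** The boundary described above: one below the first seqNo the resync will replay. */
    static long trimAboveSeqNo(long startingSeqNo) {
        return startingSeqNo - 1;
    }

    /** Keeps every op except those from terms below belowTerm with seqNo above aboveSeqNo. */
    static List<Op> trim(List<Op> ops, long belowTerm, long aboveSeqNo) {
        List<Op> kept = new ArrayList<>();
        for (Op op : ops) {
            boolean stale = op.primaryTerm < belowTerm && op.seqNo > aboveSeqNo;
            if (!stale) {
                kept.add(op);
            }
        }
        return kept;
    }
}
```

Suppose a replica holds seqNos 0 to 5 from term 1, the global checkpoint is 2 (so startingSeqNo is 3), and the new term-2 primary has a max seqNo of 3. Trimming above the max seqNo (3) leaves the term-1 op at seqNo 3 in place, whereas trimming above startingSeqNo - 1 (2) removes the term-1 ops at seqNos 3, 4, and 5.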

server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java (1)

533-572: Forward-read recovery trimming test mirrors the existing reverse-read coverage

testRecoveryTrimsLocalTranslogWithReadForward is a straightforward forward-read clone of testRecoveryTrimsLocalTranslog, correctly wiring INDEX_TRANSLOG_READ_FORWARD_SETTING and asserting Lucene/translog consistency and history equality after promotion/demotion cycles. Looks good.

server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java (1)

3876-3934: Forward snapshot test correctly validates generation-order iteration

testSnapshotReadOperationForward creates a dedicated translog with INDEX_TRANSLOG_READ_FORWARD_SETTING enabled, writes multiple generations while tracking per-generation ops, and then asserts that newSnapshot() yields the concatenated gen0→genN sequence. This is the right assertion for the forward-read path and cleanly exercises the new iteration logic.

server/src/main/java/org/opensearch/index/IndexSettings.java (1)

199-218: Excellent documentation for the new setting.

The comprehensive Javadoc clearly explains the behavior, default semantics, and edge case risks associated with forward reading. The choice to make this a non-dynamic index-scoped setting (it cannot be updated on an open index) is appropriate given the complexity of the feature and the edge cases mentioned.

@github-actions
Contributor

❕ Gradle check result for 78bc921: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@msfroh msfroh merged commit 1aed472 into opensearch-project:main Dec 10, 2025
35 checks passed

Labels

  • enhancement: Enhancement or improvement to existing feature or request
  • Indexing:Replication: Issues and PRs related to core replication framework, e.g. segrep

Development

Successfully merging this pull request may close these issues.

[Feature Request] read forward in translog

3 participants